FOREWORD


This Indian Standard was adopted by the Bureau of Indian Standards, after the draft finalized by the Smart
Infrastructure Sectional Committee had been approved by the Electronics and Information Technology Division
Council.

The composition of the panel LITD 28/P7 and the Sectional Committee LITD 28, responsible for the formulation
of this standard, is given in Annex B.
INTRODUCTION


The vision of smart cities is to use digital technologies to provide integrated services to citizens through the free
flow of information, ushering in an era of transparent governance. Designing the smart city ICT architecture is the
essential first step in this direction. Cities are complex ecosystems, where government services pertaining to
transportation, public safety, utilities, healthcare, education, social services, culture, economic development and
more are provided by a multitude of government organizations.


A simplified Technology Reference Model (TRM) of the Smart City ICT Architecture is shown in Fig. 1. Each
application or component of the Smart City ICT Architecture deals with data and hence needs to align with the
principles defined in this document.


[Figure: Application Services and Components and ICT Infrastructure, linked by the relationships "Enables", "Works with" and "Supports"]

Fig. 1 TECHNOLOGY REFERENCE MODEL


Data is generated by a variety of smart city applications, operated and managed by a host of departments and 
organizations, working towards a common goal of building and running city infrastructure to better serve the 
citizens. However, this multiplicity of data owners often causes problems related to the accuracy, consistency and
accessibility of the right data at the right time.


There is a need to bring together the large amount of data available in cities, including energy, traffic and transport,
parking, environment, ERP, water, solid waste and crowdsourced data, to curate it (by eliminating duplicate,
invalid, outdated and wrong data) and to provide a holistic view of the information, with the aim of improving and
developing innovative smart city services. Integrated data plays a vital role in understanding a problem in the right
context and in providing a solution that is in the interest of the administration as well as citizens. Trend analytics
over weekly, monthly and annual views of the data reveal insights and surprises that daily views cannot. In
business terms, the key performance indicators of the city's health, progress and objective results are found in the
long-term analysis of data.


Smart City data management will thus enable and stimulate a proper understanding of how a city's infrastructure 
is utilized in different domains, what key performance parameters they indicate, the interdependencies between 
different elements of city infrastructure and the effects of external drivers like public policy, major events, 
exigencies, and weather. 


Smart city data exhibits the characteristics of the 5 Vs, namely Volume, Velocity, Variety, Veracity and Value, each
of which comes with its own challenges.


Volume: Data volume represents the extensive amount of data available for analysis to extract valuable information.
An example of high-volume data is the data generated by surveillance cameras and sensors deployed across the
city.


Velocity: Most real-world control applications need actionable insights on a real-time basis. Streaming and
real-time analytics utilize high-velocity data to generate real-time operational alerts and insights. One example
of such data is the real-time position of a utility vehicle plying in the city, sending updates on its position every
5 seconds.




IS 18002 (Part 1) : 2021 


Variety: Variety in data arises from the range of sensors and systems deployed in cities. It also arises when the
same type of device or system generates data in heterogeneous formats or records data in different units of
measurement.


Veracity: Veracity refers to the biases, noise and abnormalities in data. Noise and abnormalities in data are a major
issue in smart city deployments. Many complex systems utilize AI models for pre-processing and filtering of data,
and AI is prone to biases induced by noise and abnormalities in data. Additionally, sensor measurements suffer
from drift due to various physical factors. Controlling veracity requires constant observation of the data and of
trends in time-series data, and taking timely remedial action.
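Observing trends in time-series data to catch sensor drift can be sketched with a simple mean-shift check against an initial baseline window. The function `detect_drift`, its window sizes and its three-standard-deviation threshold are assumptions made for illustration; real deployments may use more robust statistical or model-based drift detection.

```python
from statistics import mean, stdev


def detect_drift(readings, baseline_size=20, window=10, threshold=3.0):
    """Return True if the mean of the latest `window` readings departs from the
    baseline mean by more than `threshold` baseline standard deviations."""
    if len(readings) < baseline_size + window:
        return False  # not enough history to judge
    baseline = readings[:baseline_size]
    recent = readings[-window:]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(recent) != mu  # flat baseline: any shift counts
    return abs(mean(recent) - mu) > threshold * sigma
```

A flagged series would then trigger remediation such as recalibrating or replacing the sensor.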


Value: Data is only as valuable as the business outcomes it makes possible. Smart cities are looking at data
monetization and innovation on open data for future infrastructure development. For this, smart cities need to
make choices on storage (long term vs short term), the types of data to be ingested, and the governance policies and
security controls to be implemented on data, to ensure that data remains usable in the longer term.


The future of governance is data driven. Cities have begun to adapt their functioning to this change. Apart from
providing timely inputs on the impact of citizen services, this data-driven approach is also used to measure the
impact of investments made over a period. This helps to realistically assess the gaps between the outcomes and the
desired goals. Data is an asset for cities; it has a specific, measurable value for the Government and needs to be
managed accordingly.


To overcome data integration issues, a city needs a robust Unified Data Architecture Framework that puts in place
not only a mechanism to share data among different departments but also a set of tools and technologies to make
better use of this data for decision making.


This standard defines the conceptual model (functional and technological), data principles and reference
architecture for data in a smart city. A common data reference architecture spanning all smart cities will help
bring data into 'focus' and ensure a move towards outcome-based planning in governance.


The Data Layer Reference Architecture is intended to be used by stakeholders such as Smart City Data Officers,
Smart City CEOs, Policy and Governance Officers, Auditors and other stakeholders involved in smart city
implementation, to define the city's data architecture goals and to roll out solutions adhering to those goals.
Indian Standard


UNIFIED DIGITAL 
INFRASTRUCTURE — DATA LAYER 


PART 1 REFERENCE ARCHITECTURE 


1 SCOPE


This Standard describes the data layer reference 
architecture that comprises the key data principles that 
every smart city sub-system needs to adhere to, the core 
capabilities required to be implemented at the city level 
for realizing the data layer, the functional reference 
model and the technology reference model. 


This standard applies to data generated in smart cities
across the following streams:


a) Demand-side stream, which can give a better
understanding of specific properties and
characteristics of urban processes (e.g. building
services, government-to-citizen services) and
provide solutions for improvement.

b) Supply-side stream, to monitor incidents and
crisis situations and the respective responses and
solutions, with the aim of drawing conclusions and
recommendations.

c) Analytical stream, to identify data patterns and
correlations in order to derive predictions for
urban innovation, provide impact assessment, and
demonstrate the challenges and opportunities in
urban development.

d) Standardization stream, to bring city data
in line with international standards such as
ISO/TS 8000 Data Quality.


While the technical reference architecture lists a set
of technology components and provides an overview
of each of them, the choice of individual technology
components can vary from city to city, provided the
core data capabilities that the city should fulfil are
retained.


2 REFERENCES 


The standards given below contain provisions which, 
through reference in this text, constitute provisions of 
this standard. At the time of publication, the editions 
indicated were valid. All standards are subject to 
revision, and parties to agreements based on this 
standard are encouraged to investigate the possibility 
of applying the most recent editions of these standards.


a) IS 18000 : 2020 Unified Digital Infrastructure — 
ICT Reference Architecture. 


3 TERMINOLOGY AND SYMBOLS 


3.1 Terminology 


For the purpose of this standard, the definitions given in 
IS 18000 shall apply, in addition to the following: 


3.1.1 Access Control 


A process by which use of system resources is regulated 
according to a security policy and is permitted only by 
authorized entities (users, programs, processes, or other 
systems) according to that policy. 


3.1.2 Architecture View 

Work product expressing the architecture of a system 
from the perspective of specific system concerns. 

3.1.3 Architecture Viewpoint 


Work product establishing the conventions for the 
construction, interpretation and use of architecture 
views to frame specific system concerns. 


3.1.4 Conceptual Model (CM) 


Model describing the key concepts that characterize
the Smart City data layer architecture.


3.1.5 Data 


Re-interpretable representation of information in 
a formalized manner suitable for communication, 
interpretation, or processing. Note 1 to entry: Data 
can be processed by humans or by automatic means. 
[SOURCE: ISO/IEC 2382 : 2015, 2121272]. 


3.1.6 Data Lake 


A large repository composed of different kinds of raw
and processed data, supporting multiple storage engines,
including object storage, for storing and processing data.


3.1.7 IoT 


Internet of Things — Interconnection of computing 
devices embedded in everyday objects using the 
Internet backbone enabling them to send and receive 
data and thereby participate in generating information. 


3.1.8 Logical Model 


A logical data model or logical schema is a data model 
of a specific problem domain expressed independently 
of a particular database management product or storage
technology but in terms of data structures such as 
relational tables and columns, object-oriented classes, 
or XML tags. 


3.1.9 Metadata 


A set of data that describes and gives information
about other data. It includes descriptive, structural,
administrative, reference, statistical and legal
information about other data [SOURCE : ISO/IEC 11179].


3.1.10 Physical Model 


The physical model describes the database-specific
implementation of the logical data model.


3.1.11 Reference Model (RM) 


Model providing the overall structure of the elements
of the architecture.


3.1.12 Demand-side stream 


The data collected from various sensors, systems and 
sub-systems for a specific purpose. 
Example: ON/OFF status of streetlights, environment data 
from sensors etc. 
3.1.13 Supply-side stream 
The data flowing to various sensors, systems and
sub-systems for a specific purpose.
Example: rule-based alerts sent to streetlight systems, 
notifications on PA systems etc. 
3.1.14 Analytical stream 
The data generated within the city, using existing data 
from demand and supply side streams. 


NOTE — Analytical stream data helps to identify patterns 
and correlations for modeling and analytical purposes. For 
example, data models, data APIs etc. 


3.2 Symbols and Abbreviations 


5Vs Volume, Velocity, Variety, Veracity and
Value

API Application Programming Interface 

AMQP Advanced Message Queuing Protocol 

CKAN Comprehensive Knowledge Archive 
Network 

DKAN Drupal based Knowledge Archive 
Network 

ESB Enterprise Service Bus 

FAAS Function as a Service 

HIPAA Health Insurance Portability and 
Accountability Act 

HTTP Hypertext Transfer Protocol

ICT Information and Communication 
Technologies 


IIoT Industrial Internet of Things 

MQTT Message Queuing Telemetry Transport

NoSQL Data storage systems like document 
storage model, columnar storage & object 
storage model with alternate data retrieval 
methods which are not SQL. 

OCF Open Connectivity Foundation 

OGPL Open Government Platform 

OLAP Online Analytical Processing 

OLTP Online Transaction Processing 

OWL Web Ontology Language 

PCI Payment Card Industry 

QoS Quality of Service

RDF Resource Description Framework

REST Representational State Transfer

SOAP Simple Object Access Protocol

SQL Structured Query Language

UML Unified Modelling Language

W3C World Wide Web Consortium

XML Extensible Markup Language

4 UDI - CORE DATA PRINCIPLES AND 
CAPABILITIES 


This section describes core principles and capabilities 
that are required to be implemented by a Smart City as 
part of a fully functional data architecture framework. 


4.1 Core Data Principles 
4.1.1 Data Principles 


To achieve the key goal of establishing a consistent 
data architecture framework within a smart city, each 
system deployed in the city shall adhere to the data 
principles/expectations defined in Table 1. 


4.1.2 Open Data Principles 


The principles given in Table 2 apply only to data for
systems that exchange or publish open data in smart
city data portals.


In addition to the principles listed in 4.1.1 and 4.1.2, 
data platforms, services and applications in Smart 
City shall follow all key principles described in 
IS 18000 : 2020. 


4.2 Core Capabilities 


The set of core capabilities shown in Fig. 2 and
defined in 4.2.1 to 4.2.12 helps cities understand
the requirements that arise from managing data across
its life cycle and from data-driven decision making.
These core capabilities also help cities identify
current gaps in the implementation of a well-defined
data layer.


Table 1 Data Principles

ID   Data Principle     Description

P1   Accuracy           The degree to which data has attributes that correctly represent the true value of the intended attribute of a concept or event in a specific context of use. It has two main aspects: syntactic accuracy and semantic accuracy. Smart City data stored shall be as accurate as possible for an object, whereby the object shall have the right values and shall be represented in a consistent and unambiguous form, in alignment with known frameworks such as OWL and RDF when possible and appropriate.

P2   Completeness       The degree to which subject data associated with an entity has values for all expected attributes and related entity instances in a specific context of use. Smart City data shall reflect what is recorded based on a standard schema that defines completeness. Metadata that defines and explains the raw data should be included, with explanations and formulas for how the data was derived and calculated.

P3   Timeliness         Smart city data shall be available in a timely fashion. It shall be made available as quickly as it is collected and processed, based on a data priority defined according to the time sensitivity of its utility and value.

P4   Privacy            A set of shared values governing the protection of personally identifiable information over its lifetime, as applicable to city data. Data privacy controls as per the law of the land and the city data policy should be adhered to. Wherever applicable, domain-specific privacy principles should be adopted by Smart Cities (e.g. PCI, HIPAA).

P5   Confidentiality    The degree to which data has attributes that ensure that it is only accessible and interpretable by authorized users in a specific context of use, and is disseminated through applicable data fiduciaries. Wherever applicable, domain-specific access control policies shall be adopted by Smart Cities.

P6   Machine Readable   Smart city data should be stored in widely used file formats that easily support machine readability, interpretation and processing. Files should be accompanied by documentation related to the format and how to use it in relation to the data.

P7   Non-Redundancy     Smart city data should be acquired and stored in a timely manner and made available for multiple or generic-purpose reuse, to avoid data duplication and to promote data consistency and quality.

P8   Permanence         Smart city data released for online consumption should be available in archives and in perpetuity, as defined in the policy for the type of data. Deletion of city data should be as per the city data policy.

P9   Consistency        Smart city data should be consistent across different systems.
                        a) Data written to storage must be valid according to all defined rules, including constraints, cascades, triggers and any combination thereof.
                        b) When data is aggregated from multiple sources, there shall be consistency in the measurement of variables throughout the datasets.

P10  Non-Repudiation    Data shall include source information, such as the owner and the device that generated the data, and shall store hash values computed on the original data to support non-repudiation.
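Principle P10 can be illustrated with a minimal sketch that attaches source information to a reading and stores a SHA-256 hash of the original payload, computed over a canonical JSON encoding so that the same payload always yields the same hash. The function names and record fields are hypothetical. A hash on its own shows whether data has been altered; binding a record to its source in a way the source cannot later deny generally also requires a digital signature, which this sketch does not include.

```python
import hashlib
import json


def _canonical(payload):
    # Sorted keys and fixed separators give a stable byte sequence to hash.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_record(payload, owner, device_id):
    """Attach source information and a SHA-256 hash of the original payload."""
    return {
        "owner": owner,
        "device_id": device_id,
        "payload": payload,
        "sha256": hashlib.sha256(_canonical(payload)).hexdigest(),
    }


def verify_record(record):
    """Recompute the hash and compare it with the value stored at ingestion."""
    return hashlib.sha256(_canonical(record["payload"])).hexdigest() == record["sha256"]
```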
Table 2 Open Data Principles

ID    Open Data Principle   Description

OP1   Non-Discriminatory    Access to Smart City open data by the public should not be subject to any barriers to use. Any request for access to data shall not be profiled based on the race, gender, caste, religion or nationality of the requester.

OP2   Non-Proprietary       Smart city open data should be made available in alternative formats, to allow the public to avoid the cost of consuming data in specific formats. Data may be made available free of charge or on a chargeable basis as per the City Data Policy.
Fig. 2 SMART CITY DATA ARCHITECTURE FRAMEWORK — CORE CAPABILITIES


4.2.1 Data Governance 


Data governance is at the centre of data management
activities in a Smart City, where different types of
stakeholders are involved, and is a horizontal capability
that shall be implemented by every smart city.


Data governance is a larger framework consisting of 
policies, rules, people, processes and tools to define the 
usage, transfer, retention, security and privacy of data 
across organizations, departments and systems in cities. 


Policies shall be enforced while ingesting, processing, 
storing, publishing and archiving of data. This also 
involves overseeing, tracking and managing data 
related projects and services. 


Three key elements of data governance are:


a) Policy Definition — Standards driven language to 
document the policy. 


b) Access Authorization — Access to data based on 
the defined policy. 


c) Policy Enforcement — Agent to enforce the access 
policies. 


Policy definitions consist of information about data 
access policies, data ownership and metadata about 
the data, which should be captured as part of the data 
semantics and published in the catalogue for different 
data that is being ingested into the smart city data lake. 


What is needed is a data governance system that offers
fine-grained authorization policy creation and management,
along with a policy enforcement agent that uses the
policies defined in the catalogue to enforce them
within the complex ecosystem of a perimeter-less
enterprise network. There can be no difference or
leniency in access policies whether the application
is hosted externally or internally to the data
platform.
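The three elements above (policy definition, access authorization and policy enforcement) can be illustrated with a minimal sketch, assuming a catalogue that maps each dataset to its owner, classification and the actions each role may perform. The `Policy` class, the `CATALOGUE` entries and the `enforce` function are hypothetical names for illustration only; a city would normally rely on a dedicated policy engine. The sketch denies access by default and applies the same rule whether the requesting application is hosted internally or externally.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Catalogue entry: ownership, classification and permitted actions per role."""
    dataset: str
    owner: str
    classification: str
    allowed: dict = field(default_factory=dict)  # role -> set of actions


# Hypothetical catalogue published alongside datasets in the city data lake.
CATALOGUE = {
    "parking_occupancy": Policy(
        "parking_occupancy", "Transport Department", "open",
        {"public": {"read"}, "transport_analyst": {"read", "write"}},
    ),
    "citizen_grievances": Policy(
        "citizen_grievances", "Citizen Services", "restricted",
        {"grievance_officer": {"read", "write"}},
    ),
}


def enforce(dataset, role, action, hosted_externally=False):
    """Policy enforcement point. Denies by default; `hosted_externally` is
    accepted but deliberately ignored, so internal and external applications
    are treated identically."""
    policy = CATALOGUE.get(dataset)
    if policy is None:
        return False  # no published policy, no access
    return action in policy.allowed.get(role, set())
```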


While it is possible to define and enforce these policies
and frameworks through paper-based records or even
electronic documents, a software-based policy engine
can go a long way in helping cities automate the
implementation of these policies, standards and
processes. Cities shall create a city-specific data
governance plan with the following objectives:


a) Development of Standard Operating Procedures 
(SOPs) and city specific Data Policies including 
open data. 


b) Publishing of data on smart cities Open Data 
Portal. 


c) Supporting the transformation of Special
Purpose Vehicles (SPV) and relevant line
departments into digital organizations.


The Smart City Data Governance shall ensure that
agencies define the following data policies and
standards (including but not limited to):


a) Data privacy policy: describing the city
administration's obligations towards protecting
the privacy rights of citizens, administrators and
stakeholders, in line with the city data policy and
in conjunction with the National Data Sharing
and Accessibility Policy (NDSAP) and other
guidelines issued or endorsed by the Government
of India.


b) Data retention/archival policy: describing how 
long data should be kept and the conditions for 
data archival or retirement. 

c) Open data policy: describing when and how
the agency should publish its data for public
consumption, including the type and granularity
of data and the frequency of publication.


d) Data classification standards: describing the 
criteria for categorizing data into different access 
levels depending on need-to-know basis and the 
risks associated with getting the data into the 
wrong hands. 


e) Domain-specific data standards: describing 
domain-specific controls for collecting, organizing 
and managing data. 


4.2.2 Data Quality 


Data quality is the ability of data to satisfy the stated
business, system and technical requirements of the
Smart City. Data quality is very important because
similar information about citizens, such as address
and phone number, is collected across multiple
applications and is often entered manually. Also, when
application forms are filled in by citizens or operators,
there is always the possibility of missing data in some
fields. Data quality can be managed using the following
steps:


a) Data Cleansing: Maintenance of data to fit 
defined Smart City Data Standard for enhanced 
interoperability and decision making. 


b) Data Profiling: Systematic analysis of data to 
gather actionable and measurable information 
about its quality. 


c) Data Traceability: Tracking of the lifecycle of 
data to determine and demonstrate all changes and 
access to the data. 


d) Data Compliance: Ongoing processes to ensure 
adherence of data to both enterprise business 
rules, and, especially, to legal and regulatory 
requirements. 


e) Data Monitoring: Routine checking and validation 
of data against quality control rules to ensure 
quality and format consistency. 
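The profiling and monitoring steps can be sketched for the citizen contact example above. This is a minimal illustration assuming three required fields and a 10-digit phone number rule; the field names, the rule and the functions `profile` and `monitor` are hypothetical and would in practice come from the city's data standard.

```python
import re

# Hypothetical quality rules for a citizen contact record.
REQUIRED_FIELDS = ("name", "address", "phone")
PHONE_PATTERN = re.compile(r"^\d{10}$")  # assumes a 10-digit mobile number


def _filled(record, name):
    return bool(str(record.get(name) or "").strip())


def profile(records):
    """Data profiling: share of records in which each required field is filled."""
    if not records:
        return {}
    return {
        name: sum(1 for r in records if _filled(r, name)) / len(records)
        for name in REQUIRED_FIELDS
    }


def monitor(record):
    """Data monitoring: list the quality rules that a single record violates."""
    issues = [f"missing {name}" for name in REQUIRED_FIELDS if not _filled(record, name)]
    phone = str(record.get("phone") or "")
    if phone and not PHONE_PATTERN.match(phone):
        issues.append("invalid phone format")
    return issues
```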


4.2.3 Data Flow Architecture 


Since data is exchanged between multiple smart city
systems, any change in the data structure of one system
will affect all the downstream systems that consume the
data. Hence it is important to document the data model
or database schema of each individual system and the
data flow between systems.


System vendors delivering smart city solutions shall
document the data schemas or data models and the APIs to
access data, and publish them in repositories identified by
the smart city. System Integrators shall document the
data integration architecture, data flow and dependencies
between systems.


Smart cities shall identify tools and frameworks to
manage database schema changes, and shall use standard
formats such as SQL or XML to express those changes.
Data flow and data integration architecture may be
documented using standard UML-based notations and
artefacts.
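One way to manage database schema changes as SQL is to keep an ordered list of versioned migration statements and apply only those not yet applied. The sketch below assumes SQLite, whose `PRAGMA user_version` stores an integer in the database header, as the version counter; the table, the columns and the `migrate` function are hypothetical, and a city may equally adopt a dedicated schema migration tool.

```python
import sqlite3

# Ordered, versioned schema changes expressed as SQL; list position + 1 is the
# schema version each statement produces. Table and columns are hypothetical.
MIGRATIONS = [
    "CREATE TABLE parking_event (id INTEGER PRIMARY KEY, "
    "lot_id TEXT NOT NULL, entered_at TEXT NOT NULL)",
    "ALTER TABLE parking_event ADD COLUMN exited_at TEXT",
]


def migrate(conn):
    """Apply migrations newer than the version stored in the database and
    return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS, start=1):
        if version <= current:
            continue  # already applied
        conn.execute(statement)
        # PRAGMA values cannot be bound as parameters; version is a trusted int.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


if __name__ == "__main__":
    print(migrate(sqlite3.connect(":memory:")))  # applies both migrations
```

Because the version is recorded in the database itself, running `migrate` again is harmless, and the list of statements doubles as documentation of how the schema evolved.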


4.2.4 Data Processing 
The core data processing functions are:
a) Data Transformation;
b) Data Normalization;
c) Data Linking;
d) Data Annotation;
e) Data Auditing; and
f) Data Integration.


4.2.4.1 Data transformation 


Data transformation is the ability to apply filters,
aggregation and summarization, utilizing data models
and semantics from metadata defined in master or
reference data stores. Data transformers shall be able
to handle all forms of data, namely instantaneous and
historical data, incremental (versioned) data, and
textual and binary data.
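A filter, enrichment and aggregation step of this kind might look as follows. The sketch assumes hypothetical PM2.5 readings, a hypothetical reference table mapping sensors to city zones, and an assumed plausibility limit of 1000 for valid values; it is an illustration, not a prescribed implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reference (master) data mapping each sensor to its city zone.
SENSOR_ZONE = {"aq-01": "North", "aq-02": "North", "aq-03": "South"}


def summarize_pm25(readings, max_valid=1000.0):
    """Filter invalid readings, enrich them with reference data, then
    aggregate to an average PM2.5 value per zone."""
    by_zone = defaultdict(list)
    for r in readings:
        value = r.get("pm25")
        if value is None or not 0 <= value <= max_valid:
            continue  # filter: missing or implausible value
        zone = SENSOR_ZONE.get(r["sensor_id"])
        if zone is None:
            continue  # filter: sensor absent from reference data
        by_zone[zone].append(value)
    # Aggregate and summarize: one rounded mean per zone.
    return {zone: round(mean(values), 1) for zone, values in sorted(by_zone.items())}
```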


4.2.4.2 Data normalization 


Raw data that does not align with defined data models
or schemas needs to be normalized in data stores. Data
schemas help in validating ingested data according
to defined rules, and transformers can use the schemas
to convert data from unstructured form to structured
form.
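Schema-driven normalization, including the conversion of heterogeneous units noted under Variety in the Introduction, can be sketched as below. The raw `key=value;...` input format, the two-field `SCHEMA` and the `normalize` function are assumptions made for illustration.

```python
# Hypothetical target schema: field name -> type; temperature is stored in degC.
SCHEMA = {"sensor_id": str, "temperature": float}

# Conversions from units seen in raw feeds to the canonical unit (degC).
TO_CELSIUS = {
    "degC": lambda v: v,
    "degF": lambda v: (v - 32) * 5 / 9,
    "K": lambda v: v - 273.15,
}


def normalize(raw):
    """Turn a raw 'key=value;key=value' reading into a record that matches
    SCHEMA, converting temperature to degC. Raises ValueError on invalid input."""
    fields = dict(part.split("=", 1) for part in raw.split(";") if part)
    missing = [name for name in SCHEMA if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    unit = fields.get("unit", "degC")  # assume degC when no unit is given
    if unit not in TO_CELSIUS:
        raise ValueError(f"unsupported unit: {unit}")
    record = {name: cast(fields[name]) for name, cast in SCHEMA.items()}
    record["temperature"] = round(TO_CELSIUS[unit](record["temperature"]), 2)
    return record
```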


Data semantics is defined as data about data, that is,
metadata. Data semantics can be used to describe the
kinds of entities, the classifications and groupings
of those entities, and the structural interconnections
among them, apart from the field information and
units.


Data semantics tools offer a collection of high-level 
modelling primitives to capture the semantics. By 
analyzing the semantic models of data generated by 
different smart city component systems the city can 
undertake data consolidation activities. 


4.2.4.3 Data linking 


Smart city data is heterogeneous in nature. Heterogeneity
arises from differences in data format (text, video,
binary, etc.), point of origin (devices, social platforms,
citizen apps, enterprise systems), periodicity and
quality of data (raw, annotated).


To realize the potential of the data captured, it is
imperative to combine data generated from different
modalities. To combine the data, there is a need to
understand the data captured and to document its
semantics. Documenting the semantics of the data
using ontology languages is one such technique for
describing the data.


At the sensor level, W3C created the Semantic Sensor
Network (SSN) incubator group to describe a sensor-level
ontology. There are also domain-specific industry
consortia, such as OCF and those in the IIoT space,
working towards standardized ontologies for IoT domains.
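As an illustration of documenting semantics with a shared ontology, the sketch below describes a single reading as an Observation using the W3C SOSA vocabulary (the lightweight core of the SSN ontology), serialized as JSON-LD. The base URI `https://data.example.org/city/`, the identifiers and the function name are hypothetical placeholders.

```python
import json

SOSA = "http://www.w3.org/ns/sosa/"
BASE = "https://data.example.org/city/"  # hypothetical base URI for city resources


def to_sosa_observation(obs_id, sensor_id, observed_property, value, result_time):
    """Describe one reading as a sosa:Observation in JSON-LD, so that data from
    different departments is described with the same shared vocabulary."""
    return {
        "@context": {"sosa": SOSA, "xsd": "http://www.w3.org/2001/XMLSchema#"},
        "@id": f"{BASE}observation/{obs_id}",
        "@type": "sosa:Observation",
        "sosa:madeBySensor": {"@id": f"{BASE}sensor/{sensor_id}"},
        "sosa:observedProperty": {"@id": f"{BASE}property/{observed_property}"},
        "sosa:hasSimpleResult": value,
        "sosa:resultTime": {"@value": result_time, "@type": "xsd:dateTime"},
    }


if __name__ == "__main__":
    doc = to_sosa_observation("o1", "aq-01", "pm25", 42.0, "2021-06-01T10:00:00Z")
    print(json.dumps(doc, indent=2))
```

Once readings from different systems share identifiers for sensors and observed properties, they can be linked and queried together.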


4.2.4.4 Data annotation 


While raw data is important, there is a cost associated
with acquiring it, and it is not always possible to
deploy sensors or systems to capture all possible
data. Enriching raw data in creative ways, through
crowdsourcing and machine learning techniques, can
generate substantial value.


As an example, On-Board Diagnostics (OBD) data from each vehicle in a fleet
is used to create trip information data, which can then 
be annotated by integrating with social platforms where 
users can augment or annotate the data with their likes 
and/or dislikes, timeliness of service, tag/group routes 
of interest. All these additional data created on top of 
the trip data derived from raw sensor data can then 
be used to build interesting mobility use cases for the 
smart city. 


In addition, smart city component systems can also use
Artificial Intelligence (AI) and machine learning (ML)
techniques to annotate data, for example by augmenting
raw image/video data with information such as object
classification, object identification, object counting,
scene understanding and trip summarization.
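The trip example can be sketched as follows, assuming annotations are stored alongside, rather than inside, the values derived from raw sensor data, each tagged with its source so that crowdsourced and machine-generated annotations can be told apart. The `Trip` fields and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Trip:
    """Trip record derived from raw OBD data (fields are illustrative)."""
    trip_id: str
    route: str
    duration_min: float
    annotations: list = field(default_factory=list)


def annotate(trip, source, label, value):
    """Attach an annotation without altering the derived trip values,
    recording which source produced it (citizen app, ML model, etc.)."""
    trip.annotations.append({"source": source, "label": label, "value": value})
    return trip


def labels_from(trip, source):
    """Return the labels contributed by one source, in insertion order."""
    return [a["label"] for a in trip.annotations if a["source"] == source]
```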


4.2.4.5 Data auditing 


Auditing, as part of data lifecycle management, is an
important sub-function that enables the following:


a) Tracking historical changes of data.


b) Logging of services, users and requests for
data.

c) Applying soft and/or hard deletion of data.
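The three auditing functions can be illustrated with a minimal in-memory store. This is a sketch only, assuming that a soft deletion appends a tombstone while keeping earlier versions and that a hard deletion removes all versions but leaves the log entry; the `AuditedStore` class is hypothetical, and a production system would persist both the history and the logs.

```python
from datetime import datetime, timezone


class AuditedStore:
    """In-memory store that keeps version history, logs every request and
    supports soft and hard deletion."""

    def __init__(self):
        self.history = {}  # key -> list of versions (None marks a soft delete)
        self.log = []      # (UTC timestamp, user, action, key)

    def _record(self, user, action, key):
        self.log.append((datetime.now(timezone.utc).isoformat(), user, action, key))

    def put(self, user, key, value):
        self.history.setdefault(key, []).append(value)
        self._record(user, "put", key)

    def get(self, user, key):
        self._record(user, "get", key)
        versions = self.history.get(key)
        return versions[-1] if versions else None

    def soft_delete(self, user, key):
        # Earlier versions stay available for audit; the latest becomes a tombstone.
        self.history.setdefault(key, []).append(None)
        self._record(user, "soft_delete", key)

    def hard_delete(self, user, key):
        # All versions are removed; only the log entry remains.
        self.history.pop(key, None)
        self._record(user, "hard_delete", key)
```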

Data Confidentiality: Confidentiality of data is to be
maintained through well-defined access policies enforced
across all data processing stages. A variety of access
control mechanisms exist to give authorized applications
and/or users access to resources; some of the most
common access control models are:


a) Mandatory Access Control. 

b) Discretionary Access Control. 
c) Role Based Access Control. 

d) Attribute Based Access Control. 


Smart cities should choose one or more access control 
mechanisms, document and enforce them as part of the 
data governance processes. 
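Two of the listed models, Role Based and Attribute Based Access Control, are contrasted in the sketch below. The roles, users, permissions and attributes are hypothetical examples; the point is that RBAC decides from the roles a user holds, whereas ABAC decides from attributes of the user, the resource and the request context.

```python
# Role Based Access Control: permissions attach to roles; users hold roles.
ROLE_PERMISSIONS = {
    "traffic_operator": {"read:traffic"},
    "data_officer": {"read:traffic", "read:health", "export:traffic"},
}
USER_ROLES = {"asha": {"traffic_operator"}, "ravi": {"data_officer"}}


def rbac_allows(user, permission):
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


# Attribute Based Access Control: the decision is computed from attributes of
# the user, the resource and the request context.
def abac_allows(user_attrs, resource_attrs, context):
    if resource_attrs.get("classification") == "open":
        return True
    return (
        user_attrs.get("department") == resource_attrs.get("owner_department")
        and user_attrs.get("clearance", 0) >= resource_attrs.get("sensitivity", 0)
        and context.get("on_duty", False)
    )
```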


4.2.4.6 Data integration 


Smart cities shall be open to accepting data from multiple
sources. Hence, data integration is one of the core
capabilities that needs to be implemented by every smart
city.

Based on the system of concern, smart cities may
implement real-time data ingestion, streaming data
ingestion, batch data ingestion or all of them as required.


As part of Data Integration, a city may deploy ETL/ELT 
(Extract-Transform-Load or Extract-Load-Transform) 
tools to help in transforming data from multiple 
sources into normalized data models and according to 
the shared data semantics. This will help in improving 
the data interpretability across smart city domains and 
thereby improve the insights derived from data. 


Data integration tools shall also be used to translate
data to aid system integration (from source system to
destination system). Two types of translation are listed
below for reference:


a) Translation of internal service and data formats for
destination system compatibility and consumption.


b) Translation of internal protocols for destination
system compatibility and consumption (e.g. SOAP
to REST).
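Translation of data formats for a destination system can be sketched with the Python standard library, assuming a flat XML record (for example, a payload already extracted from a SOAP message) that a destination REST service expects as JSON. The element names and the `xml_to_json` function are hypothetical; nested or namespaced XML would need more handling, and untrusted XML input should be parsed with the precautions described in the Python documentation on XML vulnerabilities.

```python
import json
import xml.etree.ElementTree as ET


def _typed(text):
    """Keep numbers numeric in the destination format; otherwise keep text."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def xml_to_json(xml_text):
    """Translate a flat XML record from a source system into a JSON object
    for a destination system that expects JSON (e.g. a REST API)."""
    root = ET.fromstring(xml_text)
    record = {"record_type": root.tag}
    for child in root:
        record[child.tag] = _typed((child.text or "").strip())
    return json.dumps(record, sort_keys=True)
```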


4.2.5 Data Storage 


Managing the entire lifecycle of data also means that
city administrations need to make decisions on the
persistence and longevity (duration) of data storage.
Cities shall identify strategies for storing both
structured and unstructured data, and utilize tools and
technologies such as data warehouses and data lakes to
store data for long-term usage. For example, structured
data generated by enterprise applications such as solid
waste management, e-governance, parking, and traffic
violations and challans can be stored in a data warehouse
for long periods, allowing cities to build trends and
patterns and derive correlations. In contrast, unstructured
or streaming data generated by surveillance video cameras
or other sensors can add very large storage costs if stored
as is. However, certain slices of the video data that are
annotated for security violations may be stored in a data
lake for the duration mandated by legal requirements.


City administrations shall analyze and document 
storage strategies (persistent vs transient), data archival 
and purging strategies prior to system deployment. 
Investments in the right set of tools & technologies are 
key for service continuity along with compliance while 
keeping the costs under control. 


Cities should closely govern the creation, storage, 
maintenance, usage and disposal of Smart City data 
following data standards in alignment with ethical 
data handling principles like DMBoK2 and ensure 
adherence across agencies for storage and retention. 


4.2.5.1 Creation 
a) Data is available when needed. 
b) Data used are understandable and clear. 


c) Data is accessible to all members of the intended 
audience to conduct day to day business activities. 


d) Data created is trusted, accurate and as complete 
as possible. 


e) Data shall be backed up based on KPIs and the
criticality of data. KPIs such as point-in-time
recovery (PITR) and mean time to recover
(MTTR) help in defining the backup schedule,
which helps to mitigate the risk of data loss.


4.2.5.2 Storage and retention 


a) Data is stored securely at rest using encryption 
techniques and hashing algorithms as defined in 
the city data policy. 

b) Data and documents shall be retained for as long
as agencies deem them currently useful and
historically relevant.


c) Data shall be maintained and replicated in storage
at primary and disaster recovery sites as defined
in the city data policy and as required to support
non-functional requirements such as high
availability.


d) Data retention must satisfy any current data
retention policy.
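A retention and archival policy can be made machine-checkable, as in the sketch below. The data classes and day counts are hypothetical placeholders echoing the surveillance video example in 4.2.5; actual values must come from the city data policy and applicable legal requirements.

```python
from datetime import date

# Hypothetical rules per data class (in days): how long data stays online and
# when it must be purged. Real values come from the city data policy.
RETENTION = {
    "cctv_raw": {"online_days": 30, "purge_after_days": 30},  # never archived
    "cctv_incident": {"online_days": 90, "purge_after_days": 3650},
    "parking_txn": {"online_days": 365, "purge_after_days": 2555},
}


def storage_action(data_class, created, today):
    """Return 'keep_online', 'archive' or 'purge' for a record of the given class."""
    rule = RETENTION[data_class]
    age_days = (today - created).days
    if age_days < rule["online_days"]:
        return "keep_online"
    if age_days < rule["purge_after_days"]:
        return "archive"
    return "purge"
```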


4.2.5.3 Usage and maintenance 


a) Usage of data stored and retrieved should be
consistent with the purpose for which the data is
intended.


b) Data retrieval requests and fulfilment should be 
reviewed and monitored for adherence to current 
security policies and standards. 


4.2.5.4 Retirement and disposal 


a) All data nearing end-of-life shall be first retired to 
a secure offline or near-line storage repository for 
cleansing. 


b) Disassociate metadata from the data to remove 
identification. 


c) Set up a plan to temporarily remove data marked
for retirement in order to test the impact of its removal.


d) All data being disposed of must be documented. 


e) Data owners must certify no impact of the disposal 
of data from their respective systems prior to 
removal. 
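The retirement and disposal steps above can be read as preconditions that must all hold before data is actually deleted. The Python sketch below tracks them for one dataset; the class and field names are hypothetical, and a real implementation would persist this state and link it to the city's storage and approval systems.

```python
from dataclasses import dataclass, field


@dataclass
class DisposalRequest:
    """Tracks one dataset through the retirement and disposal steps (illustrative only)."""
    dataset_id: str
    owners: set[str]                      # data owners whose systems use the dataset
    retired_to_nearline: bool = False     # a) moved to offline/near-line storage
    metadata_disassociated: bool = False  # b) identifying metadata removed
    impact_tested: bool = False           # c) temporary-removal trial completed
    disposal_record: str = ""             # d) documentation of the disposal
    certified_by: set[str] = field(default_factory=set)  # e) owner sign-offs

    def certify(self, owner: str) -> None:
        """Record that an owner has certified no impact on their system."""
        if owner not in self.owners:
            raise ValueError(f"{owner} is not an owner of {self.dataset_id}")
        self.certified_by.add(owner)

    def blockers(self) -> list[str]:
        """List unmet preconditions; an empty list means disposal may proceed."""
        missing = []
        if not self.retired_to_nearline:
            missing.append("not retired to near-line storage")
        if not self.metadata_disassociated:
            missing.append("metadata still associated")
        if not self.impact_tested:
            missing.append("impact of removal not tested")
        if not self.disposal_record:
            missing.append("disposal not documented")
        uncertified = self.owners - self.certified_by
        if uncertified:
            missing.append("awaiting certification from " + ", ".join(sorted(uncertified)))
        return missing
```

Keeping the checks as an explicit list, rather than a single boolean, lets the data steward see exactly which step is still pending.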


4.2.6 Data Exchange 


Data Exchange pertains to sharing ingested and
analyzed city data with various stakeholders, in line
with the city data policy, via open data sets, Data APIs,
dashboards and a sandbox for working on open data
sets.
a) Open Data Management: Management and
accessibility of open datasets in usable formats and 
enablement of query generation on these datasets. 


b) Data Visualization: Manipulation and placement of 
data in a visual context such as infographics, dials 
and gauges, geographic maps and charts. 


c) Data Dashboards: Integration of information from 
multiple components into a unified display to 
facilitate analysis and gaining insights through data. 


d) Data Sandbox Environment: Isolation of computing 
environment in which a program or file can 
be executed without affecting the production 
environment of the services. 


4.2.7 Data Protection 


Data protection applies throughout the “collection
to exchange to retention” lifecycle of data in a city.
While implementing data protection, the city should
collect data fairly, use it only for specified purposes,
collect only the information needed for those purposes,
keep it accurate, and retain it only for as long as it is
needed. The city should ensure that data is always kept
safe and secure.


The National Security Council has published a Model
Framework for “Cyber Security Requirement for Smart
City”. Cities shall adopt the recommendations of this
model framework into their city data policy as part of
their cyber security plans for protecting data.


4.2.8 Big Data Management 
The Smart City data architecture should be designed
(for example, ingestion, data storage and cleansing) to
handle the 5 Vs of big data (volume, velocity, variety,
veracity and value), enabling easy access to and
processing of data by different users and systems.
a) Big data management should be designed to manage 
structured and unstructured datasets. 
b) Distributed database file system technologies 
(DDFS) shall be deployed to host and manage large 
volumes of unstructured data. 


c) Big data streaming technologies shall be deployed
to handle the high frequency of incoming data that
needs to be collected, aggregated and processed both
in batch and in real time.


d) Common data models shall be deployed to avoid
unnecessary data transformation and to overcome
application-level semantic differences. Data
exchanges should therefore use schemas that codify
the canonical data models.
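As a minimal sketch of item d), assume two hypothetical parking vendors that report occupancy in different shapes. Each source gets one adapter into a shared canonical model, so downstream consumers only ever see the canonical form. The vendor field names and the `ParkingOccupancy` model are illustrative assumptions, not standardized schemas.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ParkingOccupancy:
    """Hypothetical canonical model shared by all parking data exchanges."""
    site_id: str
    observed_at: datetime  # always timezone-aware, normalized to UTC
    occupied: int
    capacity: int


def from_vendor_a(msg: dict) -> ParkingOccupancy:
    # Vendor A (hypothetical) sends epoch seconds and a count of free slots.
    return ParkingOccupancy(
        site_id=msg["lotId"],
        observed_at=datetime.fromtimestamp(msg["ts"], tz=timezone.utc),
        occupied=msg["total"] - msg["free"],
        capacity=msg["total"],
    )


def from_vendor_b(msg: dict) -> ParkingOccupancy:
    # Vendor B (hypothetical) sends ISO 8601 timestamps with an offset and an occupied count.
    return ParkingOccupancy(
        site_id=msg["site"]["code"],
        observed_at=datetime.fromisoformat(msg["time"]).astimezone(timezone.utc),
        occupied=msg["occupied"],
        capacity=msg["capacity"],
    )
```

With one adapter per source, adding a new vendor means writing one new mapping function rather than changing every consuming application.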


4.2.9 Master Data Management 


Master data management is a collection of processes 
documenting various data and its sources using tools and 
technology to ensure that there is one source of truth for 
master data across the smart city systems. 


Master data management is the process of creating a 
centralized database of data entities used by multiple 
applications across the city. Examples of master data 


IS 18002 (Part 1) : 2021 


in a city could be addresses (property, buildings, road
names, locality names, etc.), assets (fixed assets such as
real estate, plant and machinery, or movable assets such
as vehicles), citizens (or customers), and many more.
This practice helps meet the data quality requirement 
as these central databases act as the single source of 
truth for all applications. 


Master Data Management includes the following: 


a) Collection of data from various sources and 
identifying the golden source for a particular 
master data. 


b) Reconciliation of inconsistencies in master data and
identification of the most accurate, timely and relevant
data.

c) Maintaining the quality of reference and master 
data. 

d) Sharing of the master data efficiently and 
effectively. 


e) Promotion of reuse of master data across systems.
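Items a) and b) can be illustrated with a simple survivorship rule: for each attribute, take the non-empty value from the most trusted source and, between equally trusted sources, the most recent one. The source names and their ranking below are hypothetical; in practice the golden source per attribute would be chosen using a guide such as Annex A.

```python
from datetime import date

# Hypothetical trust ranking of source systems for address master data (lower = more trusted).
SOURCE_PRIORITY = {"property_tax": 0, "water_utility": 1, "citizen_app": 2}


def golden_record(candidates: list[dict]) -> dict:
    """Reconcile several source records for the same entity into one golden record.

    For each attribute, the non-empty value from the most trusted source wins;
    between equally trusted sources, the most recently updated record wins.
    """
    best: dict[str, tuple] = {}  # attribute -> (rank, -update ordinal, value); smaller key wins
    for rec in candidates:
        rank = SOURCE_PRIORITY.get(rec["source"], len(SOURCE_PRIORITY))  # unknown sources rank last
        recency = -rec["updated"].toordinal()
        for attr, value in rec["data"].items():
            if value in (None, ""):
                continue  # an empty value never overrides a known one
            if attr not in best or (rank, recency) < best[attr][:2]:
                best[attr] = (rank, recency, value)
    return {attr: entry[2] for attr, entry in best.items()}


if __name__ == "__main__":
    records = [
        {"source": "citizen_app", "updated": date(2021, 5, 1),
         "data": {"ward": "12", "pincode": "560034"}},
        {"source": "property_tax", "updated": date(2020, 1, 1),
         "data": {"ward": "11", "pincode": ""}},
    ]
    print(golden_record(records))  # {'ward': '11', 'pincode': '560034'}
```

Here the ward comes from the more trusted property tax system, while the pincode it lacks is filled in from the citizen app.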


Annex A provides a Master data management reference 
implementation guide that can be used to identify the 
system that can act as the golden source of truth for a 
given data. 


4.2.10 Analytical Modelling 


Cities need to provide data science teams with
mechanisms for analytical or AI model creation and
with cognitive computing capabilities to generate data
insights.


a) Model Building: Analysis and generation of 
mathematical representations of the system and its 
services, including the statistical models used to 
understand behaviours and patterns. 


b) Model Deployment: Deployment of models in an 
automated fashion, without the need for human 
intervention in moving code or operating the 
target machine where the code will run. 


c) Model Validation: Use of various measures of 
statistical validity to determine data or model 
problems. 


d) Big Data Algorithms: Design and development 
of algorithms to access large amounts of data 
from large databases through queries and derive 
streaming and real-time analysis from them. 


e) Machine Learning: Automatic development of
models based on training data, as well as
backpropagation or feedback loops that enable the
model to be tested and retrained while processing
production data.

f) Statistical Learning: Prediction of business metrics 
and variables for the future based on historical 
data. 
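For item f), one of the simplest forms of statistical learning is a least-squares linear trend fitted to historical values and extrapolated forward. The sketch below is illustrative only; forecasting real city metrics would normally also account for seasonality and be checked with the validation measures in item c).

```python
def fit_linear_trend(values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of values[t] ≈ intercept + slope * t, for t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2  # mean of the time indices 0..n-1
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept, slope


def forecast(values: list[float], steps_ahead: int) -> list[float]:
    """Extrapolate the fitted trend to the next `steps_ahead` periods."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps_ahead)]
```

For example, `values` could hold hypothetical monthly water-bill collections, and `forecast(values, 3)` would project the next quarter.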


4.2.11 Business Intelligence 


Cities shall be able to transform their decision making
by utilizing fact-based, quality information. They should
be able to track KPIs and metrics intuitively, manipulate
data, and identify trends and outliers. Business
Intelligence (BI) platforms and tools simplify this
process of data manipulation, allowing users to access,
navigate, analyze, format and share information across
different environments. BI supports a wide range of
processes, from search and navigation to advanced
analytics, enterprise query, reporting and analysis,
dashboards and visualization, self-service access to
relevant information, and information infrastructure
management. BI systems are designed for high
performance across a broad spectrum of user and
deployment scenarios.


4.2.12 Data Discovery and Mining 


This pertains to data synthesis and visualization
methods used in preparation for advanced data
modelling activities, including:
a) Analysis of data sets through visual and graphical 
methods to summarize their main characteristics. 
b) Development and management of workflows to 
conduct analytics on data. 
c) Registry of information for reusable components 
of many types, which are used to build, document 
and test data mining tools. 


5 FUNCTIONAL AND TECHNOLOGY REFERENCE MODEL

The purpose of creating a Unified Data Layer 
Reference Architecture is to model data as a shared 
layer across all smart city component systems with 
‘shared data semantics’ and ‘normalized data models’, 
thereby avoiding data silos, data interoperability and 
interpretability issues. 


From an architectural viewpoint, the core functions of 
the Data Layer are depicted in Fig. 3. 


The functional reference model provides smart city
stakeholders with a canonical view of the
responsibilities of the smart city data architecture
framework. There are four core functions in the data
architecture framework, namely Data Integration,
Data Processing, Data Storage and Data Exchange.


The functional and technology reference model
implemented in a city shall follow the Unified Digital
Infrastructure (UDI) properties defined in IS 18000.


From the functional reference model, a set of technology
solutions is mapped to the core functions; this mapping
serves as the Technology Reference Model (depicted in
Fig. 4) in the Unified Smart City Data Architecture
Framework.

The component solutions of the Technology Reference
Model and their mapping to the functional reference
model are given in Table 3.


Table 3 Technology Component Mapping
to the Functional Area

Sl No.   Technology Component               Functional Area
(1)      (2)                                (3)
i)       IoT Message Hub                    Data Integration
ii)      Enterprise Message Bus             Data Integration
iii)     ETL tools                          Data Integration and Data Processing
iv)      Data Lake                          Data Storage
v)       Data Warehouse                     Data Storage
vi)      Data API                           Data Exchange
vii)     Data Knowledge Archive Network     Data Exchange
viii)    Complex Event Processing           Data Processing-Enrich, Act
ix)      Workflow Engine/Rule Engine        Data Processing-Enrich, Act
x)       Big Data Analytics [AI, ML]        Data Processing-Enrich
xi)      Policy Driven Control              Data Governance


5.1 Data Integration 


Data Integration is the core function of each smart city
system that acquires data from source systems such
as sensors, various governmental IT systems, social
platforms and third-party IT systems.


Data is generated by various IoT subsystems, video
feeds from public and private cameras, social media
data streams, crowd-sourced citizen data, enterprise
systems, imports of legacy or archived city data, etc.
These systems generate structured data (JSON, XML,
CSV), unstructured data (variable-length textual data
such as device logs), audio and video data, time series
data and binary data.


Data Ingestion is a sub function of data integration 
and shall support a variety of standards compliant data 
ingestion protocols. 


Data Integration function shall also support a 
publish/subscribe (asynchronous) or request/response 
(synchronous) based message exchange through a 
message broker to send commands to the connected 
edge nodes or provide updates to the connected 
IT applications which are publishing the data to the 
smart city system. 


5.1.1 IoT Message Hub


The key design goal of an IoT Message Hub is to provide
a publish-subscribe messaging infrastructure through
which the smart devices deployed in the city connect
and ingest data once they are onboarded onto the smart
city platform. Devices ingest messages by publishing
to the message hub and subscribe to push notifications
for performing actuation.


As the devices deployed in the field vary from simple
devices such as a temperature sensor to complex devices
such as an autonomous vehicle, message hubs need to
support ingesting data of different types with different
latency and throughput requirements. Message hubs
shall deploy multiple message buses with multiple
endpoints to support various data types and formats;
for example, one message bus for time series telemetry
data and another for video data, each with multiple
endpoints depending on the number of sensors ingesting
data.


Different devices have different power profiles; a city
may need to deploy battery-operated devices as well
as always-powered devices. Battery-operated devices
may only publish data and may not support actuation.
These differences in device profiles and power saving
strategies mean that IoT message hubs shall support a
variety of QoS profiles and, in turn, different messaging
protocols such as MQTT, AMQP, HTTP, WebSockets
and more, suited to use case specific QoS.
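A publish-subscribe hub delivers each published message to every subscription whose topic filter matches the message's topic. The sketch below implements the MQTT topic-filter matching rules ('+' matches exactly one level, '#' matches the remaining levels including the parent); the topic layout `city/<ward>/<domain>/<metric>` is a hypothetical naming convention. QoS delivery guarantees are handled by the broker and are not modelled here, and filters are not validated beyond the position of '#'.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription topic filter.

    '+' matches exactly one topic level; '#' (valid only as the last level)
    matches the parent level and any number of further levels.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Filters starting with a wildcard must not match topics beginning with '$' (e.g. broker stats).
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, f in enumerate(f_levels):
        if f == "#":
            if i != len(f_levels) - 1:
                raise ValueError("'#' must be the last level of a topic filter")
            return True
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)
```

A control room subscribing to `city/+/aq/pm25`, for instance, would receive PM2.5 readings from every ward without knowing the ward list in advance.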


5.1.2 Enterprise Service Bus 


Enterprise Service Bus (ESB) is a middleware tool 
used in distributed computing and helps in component 
integration by providing ability to connect to a message 
bus and allowing components to publish-subscribe 
messages based on simple structural and business 
policy rules. 


ESBs allow integration through message structures
and reduce point-to-point integration between smart city
component systems, thereby reducing deployment
and integration complexity. Messages can be routed
between application components based on message
contents and implementation or business policies.


Smart cities manage a wide variety of component
systems such as city governance platforms, city digital
twin platforms, city social platforms and other enterprise
platforms. It is therefore important for cities to deploy
an ESB that interconnects these enterprise solutions
into a unified data platform, mitigating the point-to-point
connectivity between components that leads to service
management complexity.


5.1.3 Extract-Transform-Load (ETL) Tools 


ETL tools shall be part of the smart city data architecture
framework, as they help bring data from various
sources, including legacy data sources, into a common
data warehouse. ETL tools provide a rich set of
functional libraries for various transformations, helping
to transform data according to the target schema before
it is ingested into data storage. ETL transformations
make it easier to cleanse and integrate enterprise data
into the common storage, thereby making it available to
meet business needs.


ETL tools through their support for various 
transformations provide cities with the capability 
to manage the quality of data. A robust data quality 
framework includes the measurement, analysis, 
cleansing, enhancement, matching, and consolidation 
of data. 
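The measurement, cleansing, matching and consolidation steps of a data quality framework can be sketched with the Python standard library. The sketch assumes a well-formed CSV export with a hypothetical `property_id` column used as the match key; commercial ETL tools offer far richer matching (fuzzy, probabilistic) than the exact-key match shown here.

```python
import csv
import io


def extract(csv_text: str) -> list[dict]:
    """Extract: parse CSV text (for example, a legacy system export) into row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def completeness(rows: list[dict]) -> float:
    """Measure: fraction of fields that are non-empty across all rows."""
    cells = [(v or "").strip() for row in rows for v in row.values()]
    return sum(1 for c in cells if c) / len(cells) if cells else 1.0


def cleanse_and_consolidate(rows: list[dict], key: str) -> list[dict]:
    """Cleanse (trim whitespace, upper-case the key), match rows on `key` and consolidate duplicates.

    For duplicates, the first non-empty value seen for each field is kept.
    Rows with an empty key are dropped because they cannot be matched.
    """
    merged: dict[str, dict] = {}
    for row in rows:
        clean = {k: (v or "").strip() for k, v in row.items()}
        clean[key] = clean.get(key, "").upper()
        if not clean[key]:
            continue
        target = merged.setdefault(clean[key], {})
        for name, value in clean.items():
            if value and not target.get(name):
                target[name] = value
    return list(merged.values())
```

Reporting `completeness` before and after each load is one simple metric a data quality dashboard could track over time.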


By using ETL tools and data dashboards to create
intuitive interfaces, cities can enable business users to
perform data profiling, metadata management and
data quality monitoring with minimal training. Data
quality dashboards let users continuously monitor their
information, so data is more likely to stay clean.


5.2 Data Processing 


Data Processing pertains to the processing of data in
real time or in batches. Batch processing executes a
series of programs on a scheduled basis without manual
intervention. Stream processing analyzes ingested data
continuously as it is received, and the resulting insights
are presented through operational dashboards.


Data transformation includes activities such as
converting data from one format to another, enriching
data by merging data from multiple sources, performing
aggregation functions, i.e. creating a summary of data
(for example, the total revenue earned from a citizen
across multiple services offered by the city), and
cleansing data of null values.


Data transformation can be carried out in batch mode,
where a bulk of data is processed and transformed at
a regular interval (typically once a day for large sets
of structured data from city applications such as
e-governance, solid waste management, parking, health
management and traffic management), or on streaming
data in real time (mostly used for IoT data).


5.2.1 Complex Event Processing (CEP) 


Real-time intelligence and situational awareness are
critical enablers of smart cities, and they require
real-time event processing. Complex event processing
(CEP) is responsible for processing streams of data
against defined rules to derive real-time actionable
insights.


CEP also helps in data transformation, where systems
look at incoming real-time data and perform actions
based on it. Like rule engines and workflow systems,
CEP uses the state of the entity represented by the
data to trigger and assign actions to participating
systems and humans. To process data and trigger
downstream actions, CEP, rule engines and workflow
systems require data to follow a well-defined schema.
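A common CEP pattern is a rule evaluated over a sliding time window rather than over single readings. The sketch below raises an event when the mean of readings over the last `window_s` seconds crosses a threshold and another when it falls back; the event names and the PM2.5 example are hypothetical, and a real CEP engine would also handle out-of-order events.

```python
from collections import deque
from typing import Optional


class SlidingWindowAverage:
    """Detects when the mean of readings over a sliding time window crosses a threshold."""

    def __init__(self, window_s: float, threshold: float) -> None:
        if window_s <= 0:
            raise ValueError("window must be positive")
        self.window_s = window_s
        self.threshold = threshold
        self.readings: deque = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0
        self.alerting = False

    def push(self, ts: float, value: float) -> Optional[str]:
        """Add one reading (timestamps must not decrease); return an event name on a state change."""
        self.readings.append((ts, value))
        self.total += value
        # Evict readings that have fallen out of the window (ts - window_s, ts].
        while self.readings[0][0] <= ts - self.window_s:
            _, old = self.readings.popleft()
            self.total -= old
        mean = self.total / len(self.readings)
        if mean > self.threshold and not self.alerting:
            self.alerting = True
            return "THRESHOLD_EXCEEDED"
        if mean <= self.threshold and self.alerting:
            self.alerting = False
            return "BACK_TO_NORMAL"
        return None
```

Emitting events only on state changes, rather than on every reading, keeps downstream workflow systems from being flooded with duplicate alerts.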


5.2.2 Data Virtualization 


Storing all data in every system that uses it would
mean duplicating data across various systems and
constantly synchronizing updates between them, which
impacts the quality of the data.
Data virtualization helps to access and use data
through the data exchange layer without needing to
copy the data or know its formats or physical location.
This facilitates a unified channel for all data exchange
by removing point-to-point integrations. Data
virtualization also enables real-time and near real-time
data flow by removing multiple data hops between
storage systems.


Data virtualization also eliminates the need for 
enforcing storage level policies across many systems 
for the same data. 


5.2.3 Rule Engine/Workflow Engine 


There are many smart city use cases in which a rule
needs to be executed on the ingested data for actuation
or real-time operational alerts. At the simplest level,
this may be a threshold-based rule; in the most
complex scenario, it may be a collection of resources,
assets or telemetry data aggregated over a time or space
window, triggering a set of events that manipulate the
state of the resource in the system.


The workflow engine should enable city operations
teams to configure these rules or event execution flows
based on the data that is ingested into the smart city
component system.


5.2.4 Big Data Analytics 


Big data analytics is about utilizing the cost
effectiveness of moving compute to the data instead of
moving data to the compute. Smart cities may have a
large corpus of audio, video or event data that can be
useful for performing big data analytics.


Application developers can build data analytics
solutions using AI techniques such as deep learning
and machine learning. While they can test their
solutions using data exported via the DKAN portal,
they must deploy their applications in the smart city
cloud or on-premise environment in a production
setting.


Smart city data platforms shall provide the ability for
developers to containerize applications and execute
them closer to the data within a secure enclave.
Smart cities shall leverage technologies such as
CRI (Container Runtime Interface) and FaaS
(Function-as-a-Service), being developed by various
industry standards bodies, to decouple system
dependencies and to be able to upgrade systems
independently.


5.3 Data Storage 


Smart city component systems need to manage the
different types of data that will be acquired, leading to
multiple data storage systems. The data layer shall be
a polyglot of storage systems made up of NoSQL
(document DB, time series DB, columnar DB, graph
DB), big data, OLTP and OLAP technologies (data
warehouse), object storage and in-memory databases.
It is important to note that no single data storage
mechanism is a replacement for the others; different
databases are suited to different types of specialized
storage and querying of data.


5.3.1 Data Lake 


A data lake is a set of data storage technology solutions
that the city shall deploy to support different storage
formats and different means of querying for different
content types. Standard OLTP database technologies
may not suffice due to the nature of the data being
ingested in the smart city environment.


While sensor data could be stored in either a time
series or a document database, incidents generated
from the events may need to be stored in a regular
SQL database.


In the above architecture, some of the key storage 
technologies (Object Storage, NoSQL, SQL, Big 
Data, File system) are mentioned for representational 
purposes. 


During the implementation, it is required to map the 
different data types to the right storage solution with the 
right set of life cycle, access and governance policies. 


5.3.2 Data Warehouse 


An Enterprise data warehouse may be used for storing 
integrated and processed data as well as processing 
large, complex queries and advanced analytics to 
provide the insights for data-based decision making. 
The warehouse can act as the single source for 
self-service BI and analytics and will provide 
long-term trends, patterns as well as predictive insights 
for more effective planning and optimization of city 
operations. 


The data warehouse platform delivers fast query
performance and storage efficiency for structured data
through a native columnar architecture, providing speed
and agility with a low total cost of ownership. Column-based
storage makes it easy to execute operations in
parallel on multiple processor cores, enabling massive
parallelization of query execution. It also provides
strong support for complex predictive analytic methods
through in-database analytics, handles massive data
sets for better modelling accuracy, and can work with
leading analytics and visualization tools. Further, the
data warehouse provides the following benefits:


a) Faster results for report acceleration, data mining, 
and predictive analytics. 


b) Superior price-performance ratio. 

c) Open, flexible column-oriented architecture. 

d) Data compression to reduce the data size. 

e) Support for in-database analytics.

f) Support for storage & processing of spatial data. 
g) Support for time series data. 
5.4 Data Exchange
5.4.1 Data APIs 


API stands for application programming interface. 
Data APIs are a set of read and annotate APIs that
shall be used by the developer ecosystem to develop
smart city applications utilizing the data available
within the city.


Data APIs provide a standardized entry point for
developers to access data from the data repositories.
APIs allow fine-grained access to the data according
to the defined data governance rules and allow smart
cities to meter and monetize data. Data APIs are useful
for accessing data representations across time periods
to make operational decisions. Data APIs can also
be used to annotate raw data with more meaningful
information.


Data APIs are suitable for accessing real-time data and
small batches of data from the data lake. They are not
intended for retrieving large chunks of data, such as
video, over the network. Moving data over the network
costs the city money; hence an API management layer
with API metering shall be implemented to monitor
and manage network costs. API management may also
serve as a first-level policy and governance enforcement
layer.


Following are some of the APIs that a data service
may provide:


a) Search/query Data via API. 


b) API shall support access to data in multiple
formats (for example, CSV, XML, JSON, PDF).


c) Dataset additions and updates via API (for data 
providers/publishers). 


d) Update metadata via API (for data providers). 


e) Subscribe for real time push/pull notification for 
certain types of data. 


If a city wants to allow data access for analytics that
would require moving large quantities of data over the
network, the cost-effective model is to bring the compute
to the data by wrapping the application in a container,
instead of moving the data to the application over the
network using Data APIs.


5.4.2 Data Portal 


One of the key goals of smart cities mission is to 
promote transparency and greater citizen engagement 
by making more government data, documents, tools, 
and processes publicly available through a freely 
available, open source platform. 


India, along with Canada and the United States, has
adopted an open source web-based data sharing
platform called the Open Government Platform
(OGPL). There are also other open source data
management initiatives, such as the Comprehensive
Knowledge Archive Network (CKAN), a web-based
open source management system for the storage and
distribution of open data.


5.4.3 Data Catalogue 


Smart city component systems manage data and have
data schemas that provide the structure of the data
and the relationships between the various entities in the
database. Data models and schemas should be documented
and version controlled, so that schema changes are
controlled and support seamless migration as and
when new source systems or requirements are added.


These data models and schemas shall be published 
in a common city level repository. Every source 
application should utilize domain level standard data 
models applicable for their respective domains like 
Waste management, Traffic management, Parking 
management etc. 


An application that hosts schema information about
data is referred to as a data catalogue. One of the key
design goals of the data catalogue is that both humans
and computers should be able to interpret the schemas
for making decisions.


The data catalogue shall allow easy search of data
schemas based on keywords, tags and various
contextual information (one example of such an effort
is HyperCat). Search APIs make the underlying data
models contained in a city data lake repository
discoverable and help app providers develop mashup
applications.


5.4.4 Data Knowledge Archive Network 


As part of transparency efforts and good governance
practices, governments are sharing data through open
data platforms such as CKAN, DKAN and OGPL. Data is
shared royalty-free and under open license models
to enhance the consumption of data. However, these
platforms can also be used to monetize data or to
conduct challenges based on the published data.


Cities should continually identify datasets to be exported
from their data lake, along with their purpose, goal,
duration, licensing and monetization model. A set of
key tools, such as data anonymization and compression,
should be part of the open data platform and be used
while exporting data, based on data privacy, security
and cost requirements.
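The sketch below illustrates some common steps an export tool might apply before publishing records: dropping direct identifiers, replacing an ID with a keyed-hash pseudonym (HMAC-SHA256), and generalizing quasi-identifiers such as age and location. The field names are hypothetical. These techniques reduce but do not eliminate re-identification risk, so the city data policy should decide which steps are sufficient for a given dataset.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = {"name", "phone", "aadhaar"}  # hypothetical fields removed outright


def prepare_for_export(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers, pseudonymize the citizen ID and generalize quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "citizen_id" in out:
        digest = hmac.new(secret_key, str(out["citizen_id"]).encode(), hashlib.sha256).hexdigest()
        out["citizen_id"] = digest[:16]  # stable pseudonym; cannot be recomputed without the key
    if "age" in out:
        low = (out.pop("age") // 10) * 10
        out["age_band"] = f"{low}-{low + 9}"
    if "lat" in out and "lon" in out:
        # Two decimal places is roughly 1 km precision.
        out["lat"], out["lon"] = round(out["lat"], 2), round(out["lon"], 2)
    return out
```

Using a keyed hash rather than a plain hash keeps the same citizen linkable across exported records while preventing outsiders from recomputing pseudonyms from known IDs.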


6 INFORMATION REFERENCE MODEL 


The information reference model as depicted in 
Figure 5 consists of the following conceptual models: 


a) Data Reference Model: which models data. 


b) Data Access Model: which models the access to 
data. 


c) Information Extraction Model: which models the 
process of information extraction from data. 


d) Insights Extraction Model: which models the 
process of insights extraction from data. 


These models can be extended/instantiated to create 
concrete models for any use case. 


6.1 Data Access Model 


Accessing data by any stakeholder involves interacting 
with a Data Exchange interface to connect with Data 
Sets and then access either the Data Set as a whole or 
get specific data items from the set through a selection, 
or querying process. The data access model is shown 
in Figure 6. 


The Data Set itself gets created by a Data Collection 
Process, which could be entirely automated or could 
have human steps. 


Each Data Set has a clearly designated owner (Data 
Owner), who is responsible for the Data Set in terms of 
providing permissions for sharing/exchanging and sets 
the policies for the same. 


The process of managing data collection, curation,
normalization/transformation, storage and any other
aspects is called Data Governance and Management
and is performed by a Data Steward, who is a specialist
in these aspects.


(Figure: the Information Reference Model, comprising the Data Access Model, Data Reference Model, Information Extraction Model and Insights Extraction Model.)

Fig. 5 INFORMATION REFERENCE MODEL




(Figure elements: Stakeholder, Obtain Data, DataSet, Data Collection Process, Data Steward, Governance, Data Owner; relationships: Initiates, MakesAvailable, Produces, Impacts, Handles, Participates In.)

Fig. 6 DATA ACCESS MODEL


Example: 
Fault in an environmental sensor triggers an alert 
for the field agent. 
Data Owner: City SPV 


Data Collection Process: IoT-enabled continuous
monitoring of sensor status


Data Set: Event stream of sensor status 
Stakeholder: Maintenance Supervisor 


Solution: A Field Service App that receives
service management requests through an API
from the ICCC application and subscribes to the
event stream of the environmental sensor
application, which triggers an alert to the nearest
available field agent.


Data Steward: Shall ensure that this event 
stream data is made available to the appropriate 
users. 


Data Governance: Shall ensure that the City
SPV can set appropriate access permissions
for sharing this event stream, so that timely
action can be taken.
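The roles in the example above can be sketched as a small model: a data set with a designated owner who alone grants access, and stakeholders who obtain either the whole set or selected items. The class and method names are hypothetical, and a real deployment would enforce these permissions in the data exchange layer rather than in application objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DataSet:
    """Minimal model of a data set with a designated Data Owner who controls sharing."""
    name: str
    owner: str
    items: list[dict] = field(default_factory=list)  # filled by the data collection process
    readers: set[str] = field(default_factory=set)   # stakeholders the owner has permitted

    def grant(self, granted_by: str, stakeholder: str) -> None:
        """Only the Data Owner may grant access to the data set."""
        if granted_by != self.owner:
            raise PermissionError(f"only {self.owner} can grant access to {self.name}")
        self.readers.add(stakeholder)

    def obtain(self, stakeholder: str, where: Optional[Callable[[dict], bool]] = None) -> list[dict]:
        """Return the whole data set, or only the items selected by `where`."""
        if stakeholder != self.owner and stakeholder not in self.readers:
            raise PermissionError(f"{stakeholder} has no access to {self.name}")
        return [i for i in self.items if where is None or where(i)]
```

In the example, the City SPV would grant the Maintenance Supervisor access, and the Field Service App would then select only the fault events from the sensor status stream.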


6.2 Data Reference Model 


Data is the digital representation of a Thing.
A Thing is a universal item: anything that is measurable,
describable or digitally representable and is the subject
of data.

Data is contained in a Dataset, which is an asset and a
unit for management and governance.


Data can be structured (i.e., organized into fields, as
in an Excel file), unstructured (e.g., a text document) or
semi-structured (e.g., video streams).


Typically, structured data has names for the fields 
(commonly called attributes), and these attributes map 
to the attributes of the Thing. In this document, we 
represent even unstructured data in this format, wherein 
the unstructured data has only one attribute (which we 
can call data value, for example, with the content of 
the data value field being the entire unstructured data). 


A Data Model as depicted in Fig. 7 describes these 
attributes and their properties (as far as pertaining to 
their digital values). A basic Data Model might stop 
at listing out the attribute names and the datatypes 
of the values of the attributes (e.g. string, Boolean, 
integer etc). A Data Model is an essential ingredient of 
the meta-data (data about the data). Various standard 
formats exist for describing meta-data. 


A more extensive Data model might draw or connect the 
attribute names to Vocabularies - which are commonly 
understood by practitioners in the domain. Further 
organization of parts of Vocabularies as Taxonomies 
(i.e., hierarchical or structured arrangement of terms)
could provide additional interpretability to the data. A 
vocabulary might also be organized or structured as an 
Ontology, which captures the meaning (semantics) of 
the vocabulary in terms of the meaning of the actual 
Thing, its properties, and its relationships. 
An abstract model is shown in Fig. 8. Specific metadata
models, vocabularies, taxonomies and ontologies
should be developed and standardized for specific
domains, to allow easy sharing and interpretation
of data across different stakeholders.


6.3 Information Extraction Model 


Fig. 9 depicts the Information Extraction Model.
“Information is defined as data that are endowed with
meaning and purpose.” Data needs to be rendered in
an appropriate context, in a meaningful way, to make
sense to the stakeholder.


A part of the context is provided by the question being 
asked by the seeker. 


For example, the question “When is bus 276 scheduled
to arrive at the ‘Mantri Mall’ stop?” provides the context
“The bus 276 will arrive at”. In the bus schedule data,
“10.30” is interpreted using the metadata
(“description: time, format: UTC”) and is then rendered
to provide useful information to the stakeholder.
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Data Context is itself data and can also be supplied Example: 


with data in general. For example, the query “show me 
where the current accident is" is rendered by showing 
the accident spot (which 1s geolocation data), rendered 
on a map of the locality (which is GIS data). 


6.4 Insights Extraction Model 


Insights require deeper analysis of data and its context
and usually involve an interactive process in which a
human expert works with the data. Insights also form
the basis for creating new analytics as well as predictive
models. Once the algorithms for prediction/analytics
are finalized, they can be deployed as per the Information
Extraction Model. The Insights Extraction Model is shown
in Fig. 10.


While the foundation of data, metadata and data context
remains the same, the process of extracting insights is
more involved than that for extracting information.
Understanding the spread of an epidemic


To be able to understand how an epidemic
spreads, one needs to analyze multiple datasets
together. These include a) symptom data sets
as reported by health apps, b) test reports by
testing labs, c) sales of specific medicines as
reported by pharmacies, d) patients admitted
with specific symptoms as reported by hospitals 
and e) mobility data as extracted from telecom 
operators and social media companies. A joint 
analysis of all these (and perhaps more) data 
sets is required to develop an understanding of 
the spatio-temporal spread of the epidemic. This 
is an iterative process requiring a continuous 
back and forth interaction between the human 
analysts and the data sets, until hypotheses 
are validated. Once the insights are obtained, 
these can be converted into predictive models, 
which can then be deployed to raise alerts about 
emerging hotspots. 
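The joint analysis and the final alerting step might look like the following minimal Python sketch. It joins several of the listed data sets on (ward, day) and, once an analyst has settled on how to weigh the signals, deploys that choice as a simple alert rule. The weights, threshold and ward codes are hypothetical, a weighted score stands in for whatever predictive model the analysis actually produces, and mobility data is left out for brevity.

```python
from collections import defaultdict

# Hypothetical weights an analyst might settle on after the iterative,
# human-driven analysis; a stand-in for a real predictive model.
WEIGHTS = {"symptoms": 1.0, "positive_tests": 3.0, "medicine_sales": 0.5, "admissions": 4.0}


def join_by_ward_day(datasets):
    """Join several data sets of (ward, day, count) rows into one record per
    (ward, day), keyed by data set name."""
    joined = defaultdict(dict)
    for name, rows in datasets.items():
        for ward, day, count in rows:
            joined[(ward, day)][name] = joined[(ward, day)].get(name, 0) + count
    return dict(joined)


def hotspot_alerts(joined, threshold):
    """Deployed rule: flag (ward, day) pairs whose weighted score reaches the threshold."""
    alerts = []
    for (ward, day), signals in sorted(joined.items()):
        score = sum(WEIGHTS.get(name, 0.0) * value for name, value in signals.items())
        if score >= threshold:
            alerts.append((ward, day, score))
    return alerts


if __name__ == "__main__":
    datasets = {
        "symptoms": [("W12", "2024-05-01", 40), ("W07", "2024-05-01", 5)],
        "positive_tests": [("W12", "2024-05-01", 6)],
        "admissions": [("W12", "2024-05-01", 2)],
        "medicine_sales": [("W07", "2024-05-01", 10)],
    }
    print(hotspot_alerts(join_by_ward_day(datasets), threshold=50))
```

With the sample data, ward W12 scores 40 + 3 x 6 + 4 x 2 = 66 and is flagged, while W07 scores 10 and is not.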
ANNEX A
( Clause 4.2.9 ) 
MASTER DATA MANAGEMENT REFERENCE IMPLEMENTATION GUIDE 


A-1 NEED FOR MASTER DATA MANAGEMENT


In the context of a Smart city, the master data reveals 
itself in many forms such as municipal wards, 
locations, addresses, household details, etc. and 
these data come from various sources and are used in 
different contexts in different solutions. For example, 
in Property tax solution, a house address is used for 
identifying the property to be taxed, whereas the same 
house address in a Solid Waste Management solution is 
used for identifying the house for door-to-door garbage 
collection. Irrespective of the solution, it is of utmost
importance that both records represent the same physical
(or logical) entity. In this context, it is important to
make sure that there is no duplication of such data and 
one source acts as the “master” or “golden” source of 
truth. In the example given above, to avoid duplication, 
it is highly recommended that the Property tax system 
is used as the golden source to update the household 
data such as location, address etc. into the master data 
repository and store the latitude/longitude data in GIS 
from which the Solid Waste Management system draws
the data, instead of capturing it on its own. The Solid
Waste Management system may use this “master” data to
validate and cross-verify on the ground and, if required,
request the “master” source to correct the data. Correction may be required if there
are ownership changes to the property or errors in the 
provided master data. 
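The golden-source arrangement in this example can be sketched as a small repository that accepts household records only from the designated golden source, lets consumers such as the Solid Waste Management system read them, and records correction requests instead of letting consumers edit the data. Class, field and source names are hypothetical, and the latitude/longitude data that the text places in GIS is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class HouseholdRecord:
    property_id: str
    address: str
    ward: str


@dataclass
class MasterDataRepository:
    """Accepts each data element only from its designated golden source."""
    golden_source: dict = field(default_factory=dict)   # data element -> source system
    households: dict = field(default_factory=dict)      # property_id -> HouseholdRecord
    correction_requests: list = field(default_factory=list)

    def publish(self, source: str, element: str, record: HouseholdRecord) -> None:
        if self.golden_source.get(element) != source:
            raise PermissionError(f"{source} is not the golden source for {element}")
        self.households[record.property_id] = record

    def lookup(self, property_id: str) -> HouseholdRecord:
        """Consumers read master data instead of capturing their own copy."""
        return self.households[property_id]

    def request_correction(self, requester: str, property_id: str, reason: str) -> None:
        """Consumers never edit master data; they ask the golden source to fix it."""
        self.correction_requests.append((requester, property_id, reason))


if __name__ == "__main__":
    repo = MasterDataRepository(golden_source={"household": "property_tax"})
    repo.publish("property_tax", "household", HouseholdRecord("PT-001", "12, 4th Cross", "98"))
    # The Solid Waste Management system uses the master record for door-to-door collection.
    house = repo.lookup("PT-001")
    # A field check finds a discrepancy, so a correction is requested from the master.
    repo.request_correction("solid_waste_management", "PT-001", "house number differs on the ground")
```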


In this reference guide to the Data Layer Reference 
Architecture, we provide a guideline to Smart cities on 
how to identify the solutions that can potentially act as 
the “master” to represent a certain physical or logical 
entity and thus make that solution the “golden” source
of truth for that type of data. 


One of the biggest challenges is the process of 
identifying and labelling a data element as “master” 
data, especially in municipal administration 
where multiple systems exist, mostly in silos. A 
high-level process for identifying master data for a city 
is described in A-2. The City Data Officers (CDO) and 
their virtual organization play a key role in executing 
this process of identifying master data. While the 
high-level process is a guideline, Smart cities are free 
to select a methodology that is most suited to their
purposes as long as they are able to identify and label
the master data elements and their sources.


A-2 HIGH-LEVEL PROCESS FOR 
IDENTIFYING MASTER DATA FOR A CITY 


Stages for identifying master data 


The recommended high level process for identifying 
the master data involves the following three stages: 


a) Stage I - the CDO defines and explains to all
concerned what City level master data is and why
it is important, and sets up a process to collaborate
with the departments to identify master data within
each department. In this stage, the CDO also
defines the criteria for the master data sources to
be identified as the “golden source”. For example,
if there are two sources for residential addresses,
these criteria will determine which source should
be considered the “golden source”.


b) Stage II - the departmental level master data thus
collated is normalized across all departments,
duplicates are identified, disputes are resolved, and one
source is made the “golden source” for that data
element. For example, if two departments are
sources of residential address data independently
of each other, then one of the departments will
be identified as the “golden source” based on the
criteria defined in Stage I.


c) Stage III - the City level master data is published
along with its “golden sources” and the CDO
defines the processes for ongoing master data
management.

NOTE — It is not necessary that all the “master” data must 
come from a single department or source. Indeed, it is a union
of data from multiple sources that makes up the “master data”
for a Smart city, as given in the definition.
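Stages I to III can be sketched in Python as below. Stage I criteria are expressed as a priority order of candidate sources for each data element; Stage II normalizes addresses to find records reported by more than one department and picks the golden source; Stage III publishes the resulting element-to-source registry, which, as the note says, draws on several departments. The criteria, source names and the deliberately crude normalization rules are assumptions for illustration; real address matching needs far more robust rules.

```python
import re
from collections import defaultdict

# Stage I (hypothetical criteria): for each data element, candidate
# sources in order of preference, e.g. by statutory ownership.
CRITERIA = {
    "residential_address": ["property_tax", "water_billing", "solid_waste_management"],
    "ward_boundary": ["gis"],
}

# Deliberately crude normalization rules, for illustration only.
ABBREVIATIONS = {"rd": "road", "st": "street"}


def normalize_address(raw: str) -> str:
    """Lower-case, drop punctuation, expand a few abbreviations, collapse spaces."""
    words = re.sub(r"[,.#]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def find_duplicates(records):
    """Stage II: group (source, raw_address) pairs by normalized address and
    keep the addresses reported by more than one source."""
    groups = defaultdict(list)
    for source, raw in records:
        groups[normalize_address(raw)].append(source)
    return {address: sources for address, sources in groups.items() if len(sources) > 1}


def choose_golden_source(element, candidate_sources):
    """Stage II: apply the Stage I criteria to pick one golden source."""
    for source in CRITERIA[element]:
        if source in candidate_sources:
            return source
    raise LookupError(f"no eligible golden source for {element}")


def publish_master_registry(candidates):
    """Stage III: publish the data element -> golden source registry."""
    return {element: choose_golden_source(element, sources)
            for element, sources in candidates.items()}
```

For example, "12, MG Rd." from property tax and "12 MG Road" from water billing normalize to the same address, so they are flagged as a duplicate, and the Stage I criteria then select the property tax system as the golden source.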

Table 4 provides a set of identified solutions that can 
act as potential “golden source" of truth for certain data 
based on generally accepted best practices. Smart cities
are free to identify and designate a different source for 
the same. This guideline does not define how the data 
will be derived or calculated. 


[Figure: Stages for identifying master data, showing “Defining master data at a City level” and “Departmental level master data identification”]
Table 4 Solutions and Master Data Sources

Solution / Master Source for Data Item

Geographical Information System (GIS)
a) Geo spatial data such as latitude/longitude.
b) Distance between any two points within an urban local body.
c) Municipal boundaries such as wards and zones.
d) Points of interest.
e) Area of a location.
f) Smart city assets such as field level devices.
g) Any other data related to or owned by the urban local body such as water pipelines, electrical network, sewage network, etc.

Asset Management
a) Field Asset details.

Transit Management
a) Multi-modal transport information.
b) Fare details.
c) Transport routes.
d) Transport vehicle information:
  i) Number and type of vehicles.
  ii) Daily routes.
  iii) Depot locations.
  iv) Daily details of vehicle movement.
e) Real-time passenger transport vehicle location.
f) Passenger information such as ETA/ETD.
g) Transit information within a city such as point-to-point fastest modes, shortest routes, etc.
h) Transactional and financial data related to Transit management.

Solid Waste Management
a) Garbage collection information within the city:
  i) Door-to-door.
  ii) Overall.
  iii) Recycling data.
b) Health worker information.
c) SWM vehicular information:
  i) Number and type of vehicles.
  ii) Daily routes.
  iii) Depot locations.
  iv) Daily details of vehicle movement.
d) Real-time SWM vehicle location.

Intelligent Traffic Management
a) Vehicular movement information such as no. of vehicles entering/exiting a city, average speed of vehicles, etc.
b) Traffic violation and e-Challan data:
  i) Red light violations.
  ii) Speed violations.
  iii) Number plate recognition.
  iv) Stolen vehicle identification.
c) Traffic enforcement information:
  i) Traffic signal status.
  ii) Traffic movement at various locations.
  iii) Congestion information at various locations.
d) Traffic accident related information.

Variable Message Display
a) Status information.
b) Current and historical messages displayed.
c) Revenue from advertisement displays.
d) Special message displays:
  i) Emergency messages.
  ii) Special day greetings.

Video Surveillance
a) Streaming video from a particular location.
b) Video Analytics event specific data (depends on the capability of the Video Analytics system).

Environmental Monitoring
a) All city-specific environmental pollution data.

e-Governance
a) Property tax related data:
  i) Property type (Residential, Commercial, etc.).
  ii) Property number.
  iii) Address.
  iv) Property rate.
  v) Property payment information.
b) Property information:
  i) Physical characteristics of the property such as built up area, no. of tenements, no. of floors, etc.
c) Water tax related data.
d) Municipal financial data.
e) Municipal procurement information.
f) Municipal ERP related data.
g) HR records of municipal workers.
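A city could encode its own version of Table 4 as a lookup registry, so that applications ask for a data item's designated master source rather than hard-coding it. The sketch below encodes a few rows; the item keys and the function name are hypothetical, and the `overrides` parameter reflects that a Smart city may designate a different source.

```python
from typing import Dict, Optional

# A few rows of Table 4: data item -> designated master ("golden") source.
# The item keys are hypothetical identifiers; solution names follow Table 4.
GOLDEN_SOURCES = {
    "municipal_boundaries": "Geographical Information System (GIS)",
    "fare_details": "Transit Management",
    "door_to_door_garbage_collection": "Solid Waste Management",
    "e_challan_data": "Intelligent Traffic Management",
    "environmental_pollution_data": "Environmental Monitoring",
    "property_tax_data": "e-Governance",
}


def golden_source_for(item: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the designated master source for a data item, letting a city's
    own designation override the Table 4 default."""
    if overrides and item in overrides:
        return overrides[item]
    if item not in GOLDEN_SOURCES:
        raise LookupError(f"no golden source designated for {item!r}")
    return GOLDEN_SOURCES[item]
```

Keeping the registry as data rather than code lets the CDO update designations during ongoing master data management without changing the consuming applications.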